53 research outputs found

The efficiency of logistic regression compared to normal discriminant analysis under class-conditional classification noise

Author: Bi Yingtao
Jeske Daniel R.
Publication venue: Elsevier Inc.
Publication date: 31/08/2010

In many real world classification problems, class-conditional classification noise (CCC-Noise) frequently deteriorates the performance of a classifier that is naively built by ignoring it. In this paper, we investigate the impact of CCC-Noise on the quality of a popular generative classifier, normal discriminant analysis (NDA), and its corresponding discriminative classifier, logistic regression (LR). We consider the problem of two multivariate normal populations having a common covariance matrix. We compare the asymptotic distribution of the misclassification error rate of these two classifiers under CCC-Noise. We show that when the noise level is low, the asymptotic error rates of both procedures are only slightly affected. We also show that LR is less deteriorated by CCC-Noise compared to NDA. Under CCC-Noise contexts, the Mahalanobis distance between the populations plays a vital role in determining the relative performance of these two procedures. In particular, when this distance is small, LR tends to be more tolerable to CCC-Noise compared to NDA.

Elsevier - Publisher Connector

Isoform-level gene signature improves prognostic stratification and accurately classifies glioblastoma subtypes.

Author: Bi Yingtao
Davuluri Ramana V
Macyszyn Luke
O'Rourke Donald M
Pal Sharmistha
Showe Louise C
Publication venue: eScholarship, University of California
Publication date: 06/02/2014

Molecular stratification of tumors is essential for developing personalized therapies. Although patient stratification strategies have been successful, computational methods to accurately translate a gene signature from a high-throughput platform to a clinically adaptable low-dimensional platform are currently lacking. Here, we describe PIGExClass (platform-independent isoform-level gene-expression based classification-system), a novel computational approach to derive and then transfer gene signatures from one analytical platform to another. We applied PIGExClass to design a reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) based molecular-subtyping assay for glioblastoma multiforme (GBM), the most aggressive primary brain tumor. Unsupervised clustering of TCGA (The Cancer Genome Atlas Consortium) GBM samples, based on isoform-level gene-expression profiles, recaptured the four known molecular subgroups but switched the subtype for 19% of the samples, resulting in significant (P = 0.0103) survival differences among the refined subgroups. The PIGExClass-derived four-class classifier, which requires only 121 transcript variants, assigns GBM patients' molecular subtype with 92% accuracy. This classifier was translated to an RT-qPCR assay and validated in an independent cohort of 206 GBM samples. Our results demonstrate the efficacy of PIGExClass in the design of clinically adaptable molecular subtyping assays and have implications for developing robust diagnostic assays for cancer patient stratification.

PubMed Central

eScholarship - University of California

Tree-Based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles

Author: Bi Yingtao
Davuluri Ramana V.
Gupta Ravi
Kim Hyunsoo
Publication venue: Public Library of Science
Publication date: 01/01/2011

Most position weight matrix (PWM) based bioinformatics methods developed to predict transcription factor binding sites (TFBS) assume that each nucleotide in the sequence motif contributes independently to the interaction between protein and DNA sequence, which usually produces many false positive predictions. The increasing availability of TF enrichment profiles from recent ChIP-Seq methodology facilitates the investigation of dependent structure and accurate prediction of TFBSs. We develop a novel Tree-based PWM (TPWM) approach to accurately model the interaction between a TF and its binding site. The whole tree-structured PWM can be considered a mixture of different conditional PWMs. We propose a discriminative approach, called TPD (TPWM based Discriminative Approach), to construct the TPWM from ChIP-Seq data with a pre-existing PWM. To achieve the maximum discriminative power between the positive and negative datasets, the cutoff value is determined based on the Matthews correlation coefficient (MCC). The resulting TPWMs are evaluated with respect to accuracy on extensive synthetic datasets. We then apply our TPWM discriminative approach to several real ChIP-Seq datasets to refine the current TFBS models stored in the TRANSFAC database. Experiments on both the simulated and real ChIP-Seq data show that the proposed method, starting from an existing PWM, has consistently better performance than existing tools in detecting TFBSs. The improved accuracy is the result of modelling the complete dependent structure of the motifs and better prediction of the true positive rate. The findings could lead to a better understanding of the mechanisms of TF-DNA interactions.

CiteSeerX

Public Library of Science (PLOS)

PubMed Central

Isoform level expression profiles provide better cancer signatures than gene level expression profiles

Author: Julia Tchou
Ramana V Davuluri
Sharmistha Pal
Yingtao Bi
ZhongFa Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Crossref

Springer - Publisher Connector

IsoformEx: isoform level gene expression estimation using weighted non-negative least squares from mRNA-Seq data

Author: Bi Yingtao
Davuluri Ramana V
Gupta Ravi
Kim Hyunsoo
Pal Sharmistha
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: mRNA-Seq technology has revolutionized the field of transcriptomics for identification and quantification of gene transcripts not only at gene level but also at isoform level. Estimating the expression levels of transcript isoforms from mRNA-Seq data is a challenging problem due to the presence of constitutive exons. Results: We propose a novel algorithm (IsoformEx) that employs weighted non-negative least squares estimation method to estimate the expression levels of transcript isoforms. Validations based on in silico simulation of mRNA-Seq and qRT-PCR experiments with real mRNA-Seq data showed that IsoformEx could accurately estimate transcript expression levels. In comparisons with published methods, the transcript expression levels estimated by IsoformEx showed higher correlation with known transcript expression levels from simulated mRNA-Seq data, and higher agreement with qRT-PCR measurements of specific transcripts for real mRNA-Seq data. Conclusions: IsoformEx is a fast and accurate algorithm to estimate transcript expression levels and gene expression levels, which takes into account short exons and alternative exons with a weighting scheme. The software is available at http://bioinformatics.wistar.upenn.edu/isoformex.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Evaluation of data discretization methods to derive platform independent isoform expression signatures for multi-class tumor subtyping

Author: Ramana V Davuluri
Segun Jung
Yingtao Bi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Crossref

Recommended from our members

Identifying the substrate proteins of U-box E3s E4B and CHIP by orthogonal ubiquitin transfer

Author: Bhuripanyo Karan
Bi Yingtao
Chazin Walter J.
Chen Geng
Duong Duc
Kiyokawa Hiroaki
Liu Ruochuan
Liu Xianpeng
Seyfried Nicholas T.
Wang Yiyang
Yin Jun
Zhao Bo
Zhou Han
Zhou Li
Publication date: 12/02/2024

E3 ubiquitin (UB) ligases E4B and carboxyl terminus of Hsc70-interacting protein (CHIP) use a common U-box motif to transfer UB from E1 and E2 enzymes to their substrate proteins and regulate diverse cellular processes. To profile their ubiquitination targets in the cell, we used phage display to engineer E2-E4B and E2-CHIP pairs that were free of cross-reactivity with the native UB transfer cascades. We then used the engineered E2-E3 pairs to construct “orthogonal UB transfer (OUT)” cascades so that a mutant UB (xUB) could be exclusively used by the engineered E4B or CHIP to label their substrate proteins. Purification of xUB-conjugated proteins followed by proteomics analysis enabled the identification of hundreds of potential substrates of E4B and CHIP in human embryonic kidney 293 cells. Kinase MAPK3 (mitogen-activated protein kinase 3), methyltransferase PRMT1 (protein arginine N-methyltransferase 1), and phosphatase PPP3CA (protein phosphatase 3 catalytic subunit alpha) were identified as the shared substrates of the two E3s. Phosphatase PGAM5 (phosphoglycerate mutase 5) and deubiquitinase OTUB1 (ovarian tumor domain containing ubiquitin aldehyde binding protein 1) were confirmed as E4B substrates, and β-catenin and CDK4 (cyclin-dependent kinase 4) were confirmed as CHIP substrates. On the basis of the CHIP-CDK4 circuit identified by OUT, we revealed that CHIP signals CDK4 degradation in response to endoplasmic reticulum stress.

Knowledge UChicago

Distinct mechanisms control genome recognition by p53 at its target genes linked to different cell fates.

Author: Bi Yingtao
Davuluri Ramana V
Debler Erik W.
Farkas Marina
Hashimoto Hideharu
Manfredi James J.
McMahon Steven B.
Resnick-Silverman Lois
Publication venue: Jefferson Digital Commons
Publication date: 20/01/2021

The tumor suppressor p53 integrates stress response pathways by selectively engaging one of several potential transcriptomes, thereby triggering cell fate decisions (e.g., cell cycle arrest, apoptosis). Foundational to this process is the binding of tetrameric p53 to 20-bp response elements (REs) in the genome (RRRCWWGYYY N(0-13) RRRCWWGYYY). In general, REs at cell cycle arrest targets (e.g., p21) are of higher affinity than those at apoptosis targets (e.g., BAX). However, the RE sequence code underlying selectivity remains undeciphered. Here, we identify molecular mechanisms mediating p53 binding to high- and low-affinity REs by showing that key determinants of the code are embedded in the DNA shape. We further demonstrate that differences in minor/major groove widths, encoded by G/C or A/T bp content at positions 3, 8, 13, and 18 in the RE, determine distinct p53 DNA-binding modes by inducing different Arg248 and Lys120 conformations and interactions. The predictive capacity of this code was confirmed in vivo using genome editing at the BAX RE to interconvert the DNA-binding modes, transcription pattern, and cell fate outcome.

Jefferson Digital Commons

NPEBseq: nonparametric empirical bayesian-based procedure for differential expression analysis of RNA-seq data

Author: A Mortazavi
A Oshlack
AN Brooks
C Trapnell
Cancer Genome Atlas N
CX Mao
D Risso
EP Consortium
H Jiang
H Kim
HK Ji
J Feng
J Li
JC Marioni
JH Bullard
JPZ Wang
K Kadota
KD Hansen
L Shi
LM McIntyre
M Evans
MA Dillies
MA Van De Wiel
MD Robinson
N Leng
P Glaus
PJ Balwierz
PS Hammerman
Ramana V Davuluri
RC Gentleman
RD Canales
S Anders
S Durinck
S Pal
S Tarazona
S Zheng
SB Montgomery
TD Schmittgen
TJ Hardcastle
Yingtao Bi
Publication venue: Springer Science and Business Media LLC

Crossref

The efficiency of logistic regression compared to normal discriminant analysis under class-conditional classification noise

Author: Bi Yingtao
Jeske Daniel R.

In many real world classification problems, class-conditional classification noise (CCC-Noise) frequently deteriorates the performance of a classifier that is naively built by ignoring it. In this paper, we investigate the impact of CCC-Noise on the quality of a popular generative classifier, normal discriminant analysis (NDA), and its corresponding discriminative classifier, logistic regression (LR). We consider the problem of two multivariate normal populations having a common covariance matrix. We compare the asymptotic distribution of the misclassification error rate of these two classifiers under CCC-Noise. We show that when the noise level is low, the asymptotic error rates of both procedures are only slightly affected. We also show that LR is less deteriorated by CCC-Noise compared to NDA. Under CCC-Noise contexts, the Mahalanobis distance between the populations plays a vital role in determining the relative performance of these two procedures. In particular, when this distance is small, LR tends to be more tolerable to CCC-Noise compared to NDA.

Keywords: Class noise; Misclassification rate; Misspecified model; Asymptotic distribution

Research Papers in Economics